Theoretical insights into the optimization landscape of over-parameterized shallow neural networks

Authors

  • Mahdi Soltanolkotabi
  • Adel Javanmard
  • Jason D. Lee
Abstract

In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime, where the number of observations is smaller than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for an arbitrary training data set of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients. Dedicated to the memory of Maryam Mirzakhani.
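The realizable setting described above is simple enough to simulate directly. Below is a minimal sketch in Python (dimensions, step size, and initialization scale are our own illustrative choices, not values from the paper): inputs are drawn i.i.d. from a Gaussian, labels are generated by planted weights under the quadratic activation, and gradient descent is started near the planted solution; the squared residual should then decay geometrically, in line with the linear convergence rate the abstract claims.

    import numpy as np

    # Sketch of the planted (realizable) model from the abstract; all numeric
    # choices here are illustrative assumptions, not values from the paper.
    rng = np.random.default_rng(0)
    d, k, n = 20, 5, 50                     # input dim, hidden units, samples (k*d > n)
    X = rng.standard_normal((n, d))         # inputs x_i ~ N(0, I), i.i.d.
    W_star = rng.standard_normal((k, d))    # planted weight coefficients
    y = ((X @ W_star.T) ** 2).sum(axis=1)   # y_i = sum_j (w_j*^T x_i)^2, quadratic activation

    # Gradient descent on L(W) = (1/2n) sum_i (sum_j (w_j^T x_i)^2 - y_i)^2;
    # "suitably initialized" is taken here to mean: started near the planted weights.
    W = W_star + 0.1 * rng.standard_normal((k, d))
    lr = 1e-3
    for _ in range(2000):
        A = X @ W.T                         # A[i, j] = w_j^T x_i
        r = (A ** 2).sum(axis=1) - y        # residuals
        W -= lr * (2.0 / n) * (r[:, None] * A).T @ X  # dL/dW
    print("final mean squared residual:", np.mean(r ** 2))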


Similar articles

On the Power of Over-parametrization in Neural Networks with Quadratic Activation

We provide new theoretical insights into why over-parametrization is effective in learning neural networks. For a shallow network with k hidden nodes, quadratic activation, and n training data points, we show that as long as k ≥ √(2n), over-parametrization enables local search algorithms to find a globally optimal solution for general smooth and convex loss functions. Further, despite that the number of p...
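Spelled out (our notation; the snippet states only the threshold), the training problem and over-parametrization condition read:

    \min_{W \in \mathbb{R}^{k \times d}} \;
      \frac{1}{n} \sum_{i=1}^{n}
        \ell\!\left( \sum_{j=1}^{k} \left( w_j^\top x_i \right)^{2},\, y_i \right)
    \qquad \text{with} \qquad k \ge \sqrt{2n},

where \ell is any smooth convex loss and w_j denotes the weight vector of the j-th hidden unit.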

SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data

Neural networks exhibit good generalization behavior in the over-parameterized regime, where the number of network parameters exceeds the number of observations. Nonetheless, current generalization bounds for neural networks fail to explain this phenomenon. In an attempt to bridge this gap, we study the problem of learning a two-layer over-parameterized neural network, when the data is generate...

PROJECTED DYNAMICAL SYSTEMS AND OPTIMIZATION PROBLEMS

We establish a relationship between general constrained pseudoconvex optimization problems and globally projected dynamical systems. A corresponding novel neural network model, which is globally convergent and stable in the sense of Lyapunov, is proposed. Both theoretical and numerical approaches are considered. Numerical simulations for three constrained nonlinear optimization problems a...
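For context, a globally projected dynamical system for a constrained problem \min_{x \in \Omega} f(x) is commonly written in the following standard form (our notation; the paper's exact model may differ):

    \frac{dx}{dt} \;=\; P_{\Omega}\big( x - \alpha \nabla f(x) \big) - x,
    \qquad
    P_{\Omega}(v) \;=\; \arg\min_{u \in \Omega} \lVert u - v \rVert_2,

with step parameter \alpha > 0; equilibria satisfy x = P_{\Omega}(x - \alpha \nabla f(x)), which is exactly the fixed-point characterization of a constrained stationary point.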

Topology and Geometry of Half-Rectified Network Optimization

The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of a high-dimensional non-convex problem. Some insights were recently gained using spin glass models and mean-field approximations, but at the expense of strongly simplifying the nonlinear nature of the model. In this work, we do not make any such assump...

Application of Wavelet Neural Networks for Improving of Ionospheric Tomography Reconstruction over Iran

In this paper, a new method of ionospheric tomography is developed and evaluated based on neural networks (NN). This new method is named ITNN. In this method, a wavelet neural network (WNN) with a particle swarm optimization (PSO) training algorithm is used to solve some of the ionospheric tomography problems. The results of the ITNN method are compared with the residual minimization training neura...


Journal:
  • CoRR

Volume: abs/1707.04926

Publication date: 2017